4 research outputs found
Efficient Deep Image Denoising via Class Specific Convolution
Deep neural networks have been widely used in image denoising during the past
few years. Even though they achieve great success on this problem, they are
computationally inefficient, which makes them unsuitable for deployment on
mobile devices. In this paper, we propose an efficient deep neural network for
image denoising based on pixel-wise classification. Although a
computationally efficient network cannot effectively remove noise from
arbitrary content, it can still denoise a specific type of pattern or
texture. The proposed method follows this divide-and-conquer scheme. We first
use an efficient U-Net to classify each pixel of the noisy image based
on the local gradient statistics. Then we replace part of the convolution
layers in existing denoising networks with the proposed Class Specific
Convolution layers (CSConv) which use different weights for different classes
of pixels. Quantitative and qualitative evaluations on public datasets
demonstrate that the proposed method can reduce computational cost without
sacrificing performance compared to state-of-the-art algorithms.
Comment: The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
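The idea of a class-specific convolution can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-channel image, zero padding, and a class map that is already given (in the paper it would come from the U-Net classifier). The function name `class_specific_conv` and the two example kernels are hypothetical.

```python
def class_specific_conv(image, class_map, kernels):
    """Apply, at every pixel, the kernel selected by that pixel's class label.

    image:     H x W list of lists of floats (single channel, an assumption)
    class_map: H x W list of lists of integer class labels
    kernels:   dict mapping class label -> K x K kernel (K odd)

    As in most deep learning libraries, the kernel is applied without
    flipping (cross-correlation).
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            kernel = kernels[class_map[y][x]]
            radius = len(kernel) // 2
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    # Zero padding: neighbours outside the image contribute nothing.
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[dy + radius][dx + radius] * image[yy][xx]
            out[y][x] = acc
    return out


# Hypothetical kernels: class 0 keeps the pixel, class 1 smooths it.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box_blur = [[1 / 9] * 3 for _ in range(3)]

image = [[9.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 9.0]]
class_map = [[0, 0, 0],
             [0, 1, 0],
             [0, 0, 0]]

result = class_specific_conv(image, class_map, {0: identity, 1: box_blur})
```

Only the centre pixel is labelled class 1, so it alone is replaced by the mean of its 3x3 neighbourhood, while every other pixel passes through unchanged.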
Self-Supervised Intensity-Event Stereo Matching
Event cameras are novel bio-inspired vision sensors that output pixel-level
intensity changes with microsecond accuracy, high dynamic range, and low
power consumption. Despite these advantages, event cameras cannot be directly
applied to computational imaging tasks due to the inability to obtain
high-quality intensity and events simultaneously. This paper aims to connect a
standalone event camera and a modern intensity camera so that the applications
can take advantage of both sensors. We establish this connection through a
multi-modal stereo matching task. We first convert events to a reconstructed
image and extend existing stereo networks to this multi-modal setting.
We propose a self-supervised method to train the multi-modal stereo network
without using ground truth disparity data. The structure loss calculated on
image gradients is used to enable self-supervised learning on such multi-modal
data. Exploiting the internal stereo constraint between views with different
modalities, we introduce general stereo loss functions, including disparity
cross-consistency loss and internal disparity loss, leading to improved
performance and robustness compared to existing approaches. The experiments
demonstrate the effectiveness of the proposed method, especially the proposed
general stereo loss functions, on both synthetic and real datasets. Finally, we
shed light on employing the aligned events and intensity images in downstream
tasks such as video interpolation.
Comment: This paper has been accepted by the Journal of Imaging Science &
Technology
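The abstract does not define the structure loss, so the following is only one plausible form, shown to make the idea concrete: an L1 difference between finite-difference gradients of two images. The function names `gradients` and `gradient_l1_loss` are hypothetical, and the actual loss in the paper may differ.

```python
def gradients(img):
    """Forward-difference gradients (gx, gy) of an H x W list-of-lists image."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] for x in range(w - 1)] for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] for x in range(w)] for y in range(h - 1)]
    return gx, gy


def gradient_l1_loss(img_a, img_b):
    """Mean absolute difference between the gradients of two same-sized images."""
    total, count = 0.0, 0
    for grad_a, grad_b in zip(gradients(img_a), gradients(img_b)):
        for row_a, row_b in zip(grad_a, grad_b):
            for a, b in zip(row_a, row_b):
                total += abs(a - b)
                count += 1
    return total / count


intensity = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
brighter = [[v + 5.0 for v in row] for row in intensity]  # same structure, offset
flat = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]                 # no structure at all

offset_loss = gradient_l1_loss(intensity, brighter)
flat_loss = gradient_l1_loss(intensity, flat)
```

A constant brightness offset leaves the gradients unchanged, so `offset_loss` is zero while `flat_loss` is not. This insensitivity to absolute intensity is one reason a gradient-based comparison may suit images from different modalities.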